Yichu Chen, Gu Gong, Kate Jones, Gianni Spiga
2022-11-29
Sexually transmitted diseases are infections that are spread during vaginal, oral, or anal intercourse. Although sometimes undetected, STDs can cause serious health problems in individuals and lead to reproductive issues. Here, we examine two types of bacterial infections that can be easily treated once diagnosed. In order to understand the factors that influence the chance of reinfection and hopefully decrease the cases in high-risk populations, the following data was analyzed. Time to reinfection is studied for three different groups: those infected with gonorrhea, those infected with chlamydia, and those infected with both. We also analyze various predictors to see if they significantly influence the survival probability.
The predictors are as follows:
If we can understand the factors that lead to an increased risk of reinfection, we can utilize targeted preventive care and hopefully reduce the number of individuals who become infected.
Our first step is to visualize a few of the variables in our data:
Below we create two pie charts. The first shows the percentages for each type of initial infection. The second shows the percentages for the number of partners each patient had.
Roughly 45% of the patients were initially infected by chlamydia, 16% of the patients were initially infected by gonorrhea; 39% of the patients were infected by both types of bacteria to start with. It looks like that infection by chlamydia was more commonly seen than infection by gonorrhea.
About 70% of the patients reported to have 1 sex partner; 16.6% of them had 2 partners; 8% had no sex partner; it is very rare to have 3 or more sex partners; still, it is worth notice that one of the patients reported to have 19 sex partners.
## Call:
## survdiff(formula = surv_object ~ iinfct, data = std)
##
## N Observed Expected (O-E)^2/E (O-E)^2/V
## iinfct=gonorrhea 140 73 54.5 6.28042 7.50617
## iinfct=chlamydia 396 135 153.0 2.12201 3.81136
## iinfct=both 341 139 139.5 0.00166 0.00278
##
## Chisq= 8.5 on 2 degrees of freedom, p= 0.01
## Call:
## coxph(formula = surv_object ~ iinfct, data = std)
##
## n= 877, number of events= 347
##
## coef exp(coef) se(coef) z Pr(>|z|)
## iinfctchlamydia -0.4202 0.6569 0.1457 -2.884 0.00393 **
## iinfctboth -0.2980 0.7423 0.1450 -2.055 0.03984 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## iinfctchlamydia 0.6569 1.522 0.4937 0.8741
## iinfctboth 0.7423 1.347 0.5587 0.9863
##
## Concordance= 0.524 (se = 0.016 )
## Likelihood ratio test= 7.93 on 2 df, p=0.02
## Wald test = 8.37 on 2 df, p=0.02
## Score (logrank) test = 8.46 on 2 df, p=0.01
- The cox.zph function was then applied to the model which resulted in a
p-value of 0.24 reinforcing that the proportional hazards assumption is
maintained. Therefore, we chose to apply the Cox model which is
elaborated on the next slide.
cox1 <-
coxph(
surv_object ~ iinfct + marital + race + os12m + os30d +
rs12m + rs30d + abdpain + discharge + dysuria + condom +
itch + lesion + rash + lymph + vagina + dchexam + abnode +
age + yschool + npartner,
data = std
)
summary(cox1)## Call:
## coxph(formula = surv_object ~ iinfct + marital + race + os12m +
## os30d + rs12m + rs30d + abdpain + discharge + dysuria + condom +
## itch + lesion + rash + lymph + vagina + dchexam + abnode +
## age + yschool + npartner, data = std)
##
## n= 877, number of events= 347
##
## coef exp(coef) se(coef) z Pr(>|z|)
## iinfctchlamydia -0.331853 0.717593 0.149264 -2.223 0.02620 *
## iinfctboth -0.263731 0.768180 0.149262 -1.767 0.07724 .
## maritalMarried 0.054707 1.056231 0.431282 0.127 0.89906
## maritalSingle 0.408462 1.504502 0.295237 1.384 0.16651
## raceWhite -0.112104 0.893951 0.141248 -0.794 0.42739
## os12m1 -0.205985 0.813845 0.212025 -0.972 0.33129
## os30d1 -0.337675 0.713427 0.238454 -1.416 0.15675
## rs12m1 0.036907 1.037596 0.444944 0.083 0.93389
## rs30d1 -0.194243 0.823458 0.565166 -0.344 0.73108
## abdpain1 0.233865 1.263474 0.155288 1.506 0.13207
## discharge1 0.113143 1.119792 0.114148 0.991 0.32159
## dysuria1 0.162934 1.176960 0.155112 1.050 0.29352
## condomnever -0.263793 0.768133 0.119652 -2.205 0.02748 *
## itch1 -0.149871 0.860819 0.154492 -0.970 0.33200
## lesion1 -0.188159 0.828483 0.333106 -0.565 0.57217
## rash1 0.003932 1.003940 0.392693 0.010 0.99201
## lymph1 -0.030839 0.969632 0.547483 -0.056 0.95508
## vagina1 0.349257 1.418013 0.174697 1.999 0.04559 *
## dchexam1 -0.465508 0.627816 0.229914 -2.025 0.04290 *
## abnode1 0.193753 1.213797 0.425224 0.456 0.64864
## age 0.007855 1.007886 0.013891 0.565 0.57176
## yschool -0.127454 0.880334 0.039309 -3.242 0.00119 **
## npartner 0.076185 1.079162 0.054067 1.409 0.15881
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## iinfctchlamydia 0.7176 1.3935 0.5356 0.9615
## iinfctboth 0.7682 1.3018 0.5733 1.0292
## maritalMarried 1.0562 0.9468 0.4536 2.4596
## maritalSingle 1.5045 0.6647 0.8435 2.6835
## raceWhite 0.8940 1.1186 0.6778 1.1791
## os12m1 0.8138 1.2287 0.5371 1.2332
## os30d1 0.7134 1.4017 0.4471 1.1385
## rs12m1 1.0376 0.9638 0.4338 2.4818
## rs30d1 0.8235 1.2144 0.2720 2.4929
## abdpain1 1.2635 0.7915 0.9319 1.7130
## discharge1 1.1198 0.8930 0.8953 1.4006
## dysuria1 1.1770 0.8496 0.8684 1.5951
## condomnever 0.7681 1.3019 0.6076 0.9711
## itch1 0.8608 1.1617 0.6359 1.1652
## lesion1 0.8285 1.2070 0.4313 1.5916
## rash1 1.0039 0.9961 0.4650 2.1675
## lymph1 0.9696 1.0313 0.3316 2.8355
## vagina1 1.4180 0.7052 1.0069 1.9970
## dchexam1 0.6278 1.5928 0.4001 0.9852
## abnode1 1.2138 0.8239 0.5275 2.7932
## age 1.0079 0.9922 0.9808 1.0357
## yschool 0.8803 1.1359 0.8151 0.9508
## npartner 1.0792 0.9266 0.9707 1.1998
##
## Concordance= 0.634 (se = 0.016 )
## Likelihood ratio test= 73.29 on 23 df, p=4e-07
## Wald test = 69.84 on 23 df, p=1e-06
## Score (logrank) test = 71.68 on 23 df, p=7e-07
From here, the goal is to remove extraneous variables and include models that are statistically significant and lower the AIC of the model.
After creating a model with just these variables, condom use became less significant. However, no condom use is much more significant with a p-value of 0.14 versus a p-value of 0.80 for condom use sometimes. We create a model that drops the condom variable, but the AIC increases, suggesting that we should keep it. To see if recategorizing the values would lead to an increase in significance and decrease in AIC, we split condom use into two categories: never and use, where use includes individuals who sometimes use condoms and those who always use condoms. The new p-value for never using condoms was 0.009 and AIC decreased, so we kept the transformed variable in the model.
cox.model = coxph(surv_object ~ iinfct+condom+vagina
+dchexam+yschool, data = std)
summary(cox.model)## Call:
## coxph(formula = surv_object ~ iinfct + condom + vagina + dchexam +
## yschool, data = std)
##
## n= 877, number of events= 347
##
## coef exp(coef) se(coef) z Pr(>|z|)
## iinfctchlamydia -0.38468 0.68067 0.14618 -2.632 0.00850 **
## iinfctboth -0.24681 0.78129 0.14561 -1.695 0.09006 .
## condomnever -0.29779 0.74245 0.11551 -2.578 0.00994 **
## vagina1 0.40660 1.50171 0.16739 2.429 0.01514 *
## dchexam1 -0.37042 0.69044 0.22123 -1.674 0.09405 .
## yschool -0.14357 0.86626 0.03332 -4.308 1.64e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## iinfctchlamydia 0.6807 1.4692 0.5111 0.9065
## iinfctboth 0.7813 1.2799 0.5873 1.0393
## condomnever 0.7425 1.3469 0.5920 0.9311
## vagina1 1.5017 0.6659 1.0817 2.0848
## dchexam1 0.6904 1.4483 0.4475 1.0652
## yschool 0.8663 1.1544 0.8115 0.9247
##
## Concordance= 0.603 (se = 0.017 )
## Likelihood ratio test= 45.32 on 6 df, p=4e-08
## Wald test = 46.51 on 6 df, p=2e-08
## Score (logrank) test = 46.77 on 6 df, p=2e-08
With our reduced model, we still have an overall highly significant model supported by the Liklihood Ratio Test, Wald test, and the Score test.
In the next slide, we will use residuals to address our considerations of stratification and transformation of quantitative variables.
## chisq df p
## iinfct 3.5184 2 0.17
## condom 1.0539 1 0.30
## vagina 1.2835 1 0.26
## dchexam 0.4085 1 0.52
## yschool 0.0214 1 0.88
## GLOBAL 6.1176 6 0.41
Since our standard 45 degree line fits the curve well, we can conclude
that our final model is indeed a good fit.
## Call:
## coxph(formula = surv_object ~ iinfct + condom + vagina + dchexam +
## yschool, data = std)
##
## n= 877, number of events= 347
##
## coef exp(coef) se(coef) z Pr(>|z|)
## iinfctchlamydia -0.38468 0.68067 0.14618 -2.632 0.00850 **
## iinfctboth -0.24681 0.78129 0.14561 -1.695 0.09006 .
## condomnever -0.29779 0.74245 0.11551 -2.578 0.00994 **
## vagina1 0.40660 1.50171 0.16739 2.429 0.01514 *
## dchexam1 -0.37042 0.69044 0.22123 -1.674 0.09405 .
## yschool -0.14357 0.86626 0.03332 -4.308 1.64e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## iinfctchlamydia 0.6807 1.4692 0.5111 0.9065
## iinfctboth 0.7813 1.2799 0.5873 1.0393
## condomnever 0.7425 1.3469 0.5920 0.9311
## vagina1 1.5017 0.6659 1.0817 2.0848
## dchexam1 0.6904 1.4483 0.4475 1.0652
## yschool 0.8663 1.1544 0.8115 0.9247
##
## Concordance= 0.603 (se = 0.017 )
## Likelihood ratio test= 45.32 on 6 df, p=4e-08
## Wald test = 46.51 on 6 df, p=2e-08
## Score (logrank) test = 46.77 on 6 df, p=2e-08
Although there were no obvious outliers for the condom use variable or initial infection type, the other variables had outliers that are summarized below:
Although we already went over the mathematical interpretation of our model, we want to also touch on some intuitive reasoning behind our coefficients. First, we saw earlier that each additional year of schooling leads to a further reduction in risk of reinfection. As individuals progress through the public schooling system, they are progressively taught more about sexual education and diseases. Increased awareness would intuitively lead to a decrease in risk of contraction because people become more aware of the dangers of STDs and how to prevent contracting them.
The model showed that vaginal involvement considerably increased the risk of reinfection. Vaginal involvement leads to an increased susceptibility to infection since there is tissue damage. Hence, when patients experience vaginal involvement they should be made aware that they are at a higher risk of contracting an STD and hence should be especially careful, being sure to discuss STD and testing with their partners.
Counterintuitively, our model suggests that never using condoms actually decreases risk of reinfection. There are a few hypotheses that could be tested with further studies to explain this relationship. First, it may be that those individuals who are not using condoms are only in exclusive relationships and hence have one consistent partner. Alternatively, it may be the case that those people not using condoms are more selective with their partners.
The last variable to discuss is discharge. This variable may require further exploration because we are not given any additional information in terms of the type of discharge. Vaginal discharge is actually a fairly common occurrence and does not necessarily imply that there is an issue. It would be helpful to have more information about the type of discharge and if the patient regularly has vaginal discharge.
The goal of this study was to analyze risk factors for reinfection of gonorrhea and chlamydia in order to inform the population and decrease the spread of STDs. From this model, we found that years of education is the most influential factor in determining risk of reinfection. Therefore, we suggest that sexual education be consistently addressed earlier in school in order to raise awareness and communicate the importance of safe sex practices and STD testing.
Lastly, we noticed in the exploratory data analysis that our population is lacking racial diversity. In further studies, it would be interesting to gather data from a more diverse population, and also gather information about income level to see if there is any correlation between income and risk of infection.